Code Cell 1 (5%) - Library and data import.
In [73]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import s3fs
fs = s3fs.S3FileSystem(anon=False)
s3_bucket_path = 's3://amazon-sagemaker-058264306111-us-east-1-e23504aef6c5/dzd_5l5kah6gnsnq3r/dqqwae05x04zzb/dev/'
# Read only the first 2000 reviews instead of loading the full file and slicing
df = pd.read_csv(s3_bucket_path + 'review.csv', nrows=2000)
In [76]:
df.head()
Out[76]:
| | review_id | user_id | business_id | stars | useful | funny | cool | text | date | text_length |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KU_O5udG6zpxOg-VcAEodg | mh_-eMZ6K5RLWhZyISBhwA | XQfwVwDr-v0ZS3_CbbE5Xw | 3 | 0 | 0 | 0 | If you decide to eat here, just be aware it is... | 2018-07-07 22:09:11 | 101 |
| 1 | BiTunyQ73aT9WBnpR9DZGw | OyoGAe7OKpv6SyGZT5g77Q | 7ATYjTIgM3jUlt4UM3IypQ | 5 | 1 | 0 | 1 | I've taken a lot of spin classes over the year... | 2012-01-03 15:28:18 | 151 |
| 2 | saUsX_uimxRlCVr67Z4Jig | 8g_iMtfSiwikVnbP2etR0A | YjUWPpI6HXG530lwP-fb2A | 3 | 0 | 0 | 0 | Family diner. Had the buffet. Eclectic assortm... | 2014-02-05 20:30:30 | 55 |
| 3 | AqPFMleE6RsU23_auESxiA | _7bHUi9Uuf5__HHc_Q8guQ | kxX2SOes4o-D3ZQBkiMRfA | 5 | 1 | 0 | 1 | Wow! Yummy, different, delicious. Our favo... | 2015-01-04 00:01:03 | 40 |
| 4 | Sx8TMOWLNuJBWer-0pcmoA | bcjbaE6dDog4jkNY91ncLQ | e4Vwtrqf-wpJfwesgvdgxQ | 4 | 1 | 0 | 1 | Cute interior and owner (?) gave us tour of up... | 2017-01-14 20:54:15 | 94 |
Code Cell 2 (5%) - Calculate and visualize the distribution of review length.
In [77]:
df["text_length"] = df["text"].apply(lambda x : len(x.split()))
df.head()
Out[77]:
| | review_id | user_id | business_id | stars | useful | funny | cool | text | date | text_length |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KU_O5udG6zpxOg-VcAEodg | mh_-eMZ6K5RLWhZyISBhwA | XQfwVwDr-v0ZS3_CbbE5Xw | 3 | 0 | 0 | 0 | If you decide to eat here, just be aware it is... | 2018-07-07 22:09:11 | 101 |
| 1 | BiTunyQ73aT9WBnpR9DZGw | OyoGAe7OKpv6SyGZT5g77Q | 7ATYjTIgM3jUlt4UM3IypQ | 5 | 1 | 0 | 1 | I've taken a lot of spin classes over the year... | 2012-01-03 15:28:18 | 151 |
| 2 | saUsX_uimxRlCVr67Z4Jig | 8g_iMtfSiwikVnbP2etR0A | YjUWPpI6HXG530lwP-fb2A | 3 | 0 | 0 | 0 | Family diner. Had the buffet. Eclectic assortm... | 2014-02-05 20:30:30 | 55 |
| 3 | AqPFMleE6RsU23_auESxiA | _7bHUi9Uuf5__HHc_Q8guQ | kxX2SOes4o-D3ZQBkiMRfA | 5 | 1 | 0 | 1 | Wow! Yummy, different, delicious. Our favo... | 2015-01-04 00:01:03 | 40 |
| 4 | Sx8TMOWLNuJBWer-0pcmoA | bcjbaE6dDog4jkNY91ncLQ | e4Vwtrqf-wpJfwesgvdgxQ | 4 | 1 | 0 | 1 | Cute interior and owner (?) gave us tour of up... | 2017-01-14 20:54:15 | 94 |
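The `text_length` column counts whitespace-separated tokens, not characters. A minimal standard-library sketch of the same count follows; the helper name `word_count` is hypothetical, and its handling of missing values is an assumption (the `lambda` in the cell above would raise on a non-string value).

```python
def word_count(text):
    """Count whitespace-separated tokens; treat non-string values (e.g. NaN) as empty."""
    if not isinstance(text, str):
        return 0
    return len(text.split())


sample = "Family diner. Had the buffet."
print(word_count(sample))  # 5
print(word_count(None))    # 0
```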
In [78]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.displot(df.text_length, kde=False)
Out[78]:
<seaborn.axisgrid.FacetGrid at 0x7f1ba45c38d0>
Code Cell 3 (10%) - Build a BERTopic Model with UMAP (target dimension of 5, n_neighbors of 15, and the usage of cosine similarity to measure the distance).
In [79]:
import os
os.environ['TOKENIZERS_PARALLELISM']='true'
In [80]:
%%time
%matplotlib notebook
%matplotlib inline
CPU times: user 1.08 ms, sys: 49 μs, total: 1.13 ms
Wall time: 1.16 ms
In [81]:
from bertopic import BERTopic
from umap import UMAP
In [82]:
# Initiate UMAP
umap_model = UMAP(n_neighbors=15,
                  n_components=5,
                  min_dist=0.0,
                  metric='cosine',
                  random_state=100)
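With `metric='cosine'`, neighbors are chosen by the angle between embedding vectors rather than by their length. The sketch below illustrates cosine distance on toy vectors using only the standard library; the function name and the vectors are hypothetical, and UMAP's own implementation is not reproduced here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: near 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]    # same direction as u, twice the length
w = [-3.0, 0.0, 1.0]   # orthogonal to u (dot product is 0)

print(cosine_distance(u, v))  # very close to 0
print(cosine_distance(u, w))  # 1.0
```

Because only direction matters, a long review and a short review with similar content can still end up as close neighbors.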
In [83]:
# Initiate BERTopic
model = BERTopic(umap_model=umap_model, language="english", calculate_probabilities=True)
In [84]:
# Run BERTopic model
text_topics, probabilities = model.fit_transform(df.text)
Code Cell 4 (5%) - Visualize the most relevant words for the top 10 topics
In [85]:
model.get_topic_info().head(10)
Out[85]:
| | Topic | Count | Name | Representation | Representative_Docs |
|---|---|---|---|---|---|
| 0 | -1 | 659 | -1_the_and_was_to | [the, and, was, to, of, is, for, it, in, my] | [I went here on a Wednesday night and, though ... |
| 1 | 0 | 159 | 0_the_and_to_of | [the, and, to, of, bar, is, you, tour, beer, in] | [I've been driving by this location for severa... |
| 2 | 1 | 147 | 1_to_and_we_the | [to, and, we, the, was, our, for, that, this, it] | [Never been to this location before, never had... |
| 3 | 2 | 76 | 2_tacos_the_and_mexican | [tacos, the, and, mexican, is, taco, to, salsa... | [I gave it one star because it doesn't give me... |
| 4 | 3 | 72 | 3_car_to_my_and | [car, to, my, and, the, me, on, they, in, it] | [I had the worst experience of my life and it'... |
| 5 | 4 | 68 | 4_pizza_the_and_is | [pizza, the, and, is, it, was, to, crust, but,... | [Buckingham Pizza makes the best pizza around,... |
| 6 | 5 | 67 | 5_hotel_room_the_and | [hotel, room, the, and, to, was, in, stay, of,... | [Too big, too expensive and too far from Downt... |
| 7 | 6 | 60 | 6_cake_cream_ice_the | [cake, cream, ice, the, chocolate, of, and, to... | [This place is absolutely amazing. A cute litt... |
| 8 | 7 | 60 | 7_crab_the_seafood_was | [crab, the, seafood, was, and, it, of, we, wer... | [We were a bit weary about trying the Shellfis... |
| 9 | 8 | 53 | 8_sushi_roll_rolls_the | [sushi, roll, rolls, the, and, for, is, it, to... | [This is my favorite in Delaware. The menu is... |
In [86]:
fig = model.visualize_barchart(top_n_topics=10)
fig.show()
Code Cell 5 (5%) - Visualize the topic hierarchy for the top 30 topics
In [100]:
fig = model.visualize_hierarchy(top_n_topics=30)
fig.show()
Code Cell 6 (5%) - Reduce the number of topics to 15, visualize the most relevant words for each topic and the topic hierarchy
In [101]:
# top_n_topics=15 limits the chart to the 15 largest topics; the fitted model itself is unchanged
fig = model.visualize_barchart(top_n_topics=15)
fig.show()
In [102]:
fig = model.visualize_hierarchy(top_n_topics=15)
fig.show()
Code Cell 7 (5%) - Get the topic allocation of the first 5 reviews in the data
In [103]:
for i, review in enumerate(df.text[:5]):
    print(f"Review {i+1}: {review}")
    print(f"Assigned Topic: {text_topics[i]}\n")
Review 1: If you decide to eat here, just be aware it is going to take about 2 hours from beginning to end. We have tried it multiple times, because I want to like it! I have been to it's other locations in NJ and never had a bad experience. The food is good, but it takes a very long time to come out. The waitstaff is very young, but usually pleasant. We have just had too many experiences where we spent way too long waiting. We usually opt for another diner or restaurant on the weekends, in order to be done quicker.
Assigned Topic: 4

Review 2: I've taken a lot of spin classes over the years, and nothing compares to the classes at Body Cycle. From the nice, clean space and amazing bikes, to the welcoming and motivating instructors, every class is a top notch work out. For anyone who struggles to fit workouts in, the online scheduling system makes it easy to plan ahead (and there's no need to line up way in advanced like many gyms make you do). There is no way I can write this review without giving Russell, the owner of Body Cycle, a shout out. Russell's passion for fitness and cycling is so evident, as is his desire for all of his clients to succeed. He is always dropping in to classes to check in/provide encouragement, and is open to ideas and recommendations from anyone. Russell always wears a smile on his face, even when he's kicking your butt in class!
Assigned Topic: 22

Review 3: Family diner. Had the buffet. Eclectic assortment: a large chicken leg, fried jalapeño, tamale, two rolled grape leaves, fresh melon. All good. Lots of Mexican choices there. Also has a menu with breakfast served all day long. Friendly, attentive staff. Good place for a casual relaxed meal with no expectations. Next to the Clarion Hotel.
Assigned Topic: 2

Review 4: Wow! Yummy, different, delicious. Our favorite is the lamb curry and korma. With 10 different kinds of naan!!! Don't let the outside deter you (because we almost changed our minds)...go in and try something new! You'll be glad you did!
Assigned Topic: 24

Review 5: Cute interior and owner (?) gave us tour of upcoming patio/rooftop area which will be great on beautiful days like today. Cheese curds were very good and very filling. Really like that sandwiches come w salad, esp after eating too many curds! Had the onion, gruyere, tomato sandwich. Wasn't too much cheese which I liked. Needed something else...pepper jelly maybe. Would like to see more menu options added such as salads w fun cheeses. Lots of beer and wine as well as limited cocktails. Next time I will try one of the draft wines.
Assigned Topic: -1
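BERTopic uses topic `-1` for documents that were not placed in any cluster, so it helps to check how often that label appears. The sketch below tallies the five assignments printed above and computes the outlier share from the `Count` column of `get_topic_info()` (659 of 2000 reviews); it uses only the standard library, and the variable names are illustrative.

```python
from collections import Counter

# Topic ids assigned to the first five reviews, copied from the printed output
first_five_topics = [4, 22, 2, 24, -1]

counts = Counter(first_five_topics)
outliers = counts[-1]
print(f"{outliers} of {len(first_five_topics)} sampled reviews are outliers")

# Outlier share over the 2000-review sample, from the Count column for topic -1
outlier_share = 659 / 2000
print(f"Outlier share: {outlier_share:.2%}")
```

Roughly a third of the sample is unassigned, which is worth keeping in mind when reading the per-topic charts.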
Code Cell 8 (5%) - Use get_topic() to show most relevant words for the first topic
In [104]:
model.get_topic(0)
Out[104]:
[('the', 0.025793113330711288),
('and', 0.022584997480714086),
('to', 0.021849397450577245),
('of', 0.01912560837494831),
('bar', 0.019078360669966935),
('is', 0.017255142529489294),
('you', 0.016854442292338706),
('tour', 0.016117544900206723),
('beer', 0.015992416388116364),
('in', 0.015709070812270933)]
Text Cell 9 (5%) - Interpret each topic: What is the theme of each topic?
# Topic Analysis Summary
- Topic 0: Consists mainly of common stopwords like "the," "and," and "to," indicating general language patterns rather than a distinct subject. This suggests a need for additional stopword removal during preprocessing.
- Topic 1: Focuses on ordering processes, with terms like "order" and pronouns such as "we" and "my," suggesting personal accounts of purchase experiences.
- Topic 2: Dominated by the term "pizza," indicating discussions about pizza quality, taste, and overall satisfaction.
- Topic 3: Contains words like "mask," "masks," and "covid," pointing to reviews about pandemic-related safety measures, particularly mask policies.
- Topic 4: Features terms like "bagels" and "donuts," signifying a focus on bakery or breakfast items and customer experiences with them.
- Topic 5: Highlights the word "room," suggesting discussions about restaurant ambiance, layout, and overall atmosphere.
- Topic 6: Strongly linked to submarine sandwiches, with words like "subway," "mikes," "jersey," and "subs," indicating reviews about fast-food sandwich chains and their offerings.
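The stopword problem noted for Topic 0 can be illustrated on the `get_topic(0)` output itself. The sketch below filters that word list against a small hand-picked stopword set; the set and the helper name `content_words` are assumptions for illustration. In BERTopic, a common way to get the same effect at the model level is to pass a scikit-learn `CountVectorizer(stop_words="english")` as `vectorizer_model`, which is not shown here.

```python
# Top words for topic 0, copied from the get_topic(0) output
topic_0_words = ["the", "and", "to", "of", "bar", "is", "you", "tour", "beer", "in"]

# Small hand-picked stopword set, for illustration only
STOPWORDS = {"the", "and", "to", "of", "is", "you", "in", "it", "was", "for"}


def content_words(words, stopwords=STOPWORDS):
    """Keep only the words that are not in the stopword set."""
    return [w for w in words if w.lower() not in stopwords]


print(content_words(topic_0_words))  # ['bar', 'tour', 'beer']
```

Once the filler words are dropped, the remaining terms point to bars, beer, and tours rather than to a purely generic topic.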
Code Cell 10 (5%) - Visualize the topic frequency of the top 6 topics over time with the entire dataset.
In [105]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   review_id    2000 non-null   object
 1   user_id      2000 non-null   object
 2   business_id  2000 non-null   object
 3   stars        2000 non-null   int64
 4   useful       2000 non-null   int64
 5   funny        2000 non-null   int64
 6   cool         2000 non-null   int64
 7   text         2000 non-null   object
 8   date         2000 non-null   datetime64[ns]
 9   text_length  2000 non-null   int64
 10  topic        2000 non-null   int64
dtypes: datetime64[ns](1), int64(6), object(4)
memory usage: 172.0+ KB
In [106]:
df.date = pd.to_datetime(df.date)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   review_id    2000 non-null   object
 1   user_id      2000 non-null   object
 2   business_id  2000 non-null   object
 3   stars        2000 non-null   int64
 4   useful       2000 non-null   int64
 5   funny        2000 non-null   int64
 6   cool         2000 non-null   int64
 7   text         2000 non-null   object
 8   date         2000 non-null   datetime64[ns]
 9   text_length  2000 non-null   int64
 10  topic        2000 non-null   int64
dtypes: datetime64[ns](1), int64(6), object(4)
memory usage: 172.0+ KB
In [115]:
# Rows 1-6 of get_topic_info() are topics 0-5 in this run; row 0 is the outlier topic -1
top_6_topics = model.get_topic_info()['Topic'][1:7].tolist()
topics_over_time = model.topics_over_time(
    df['text'],
    timestamps=df['date'],
    nr_bins=50  # Adjust the number of bins as needed
)
# Visualize topics over time
fig = model.visualize_topics_over_time(topics_over_time, topics=top_6_topics)
fig.show()
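The `nr_bins` argument groups the review timestamps into a fixed number of time intervals before topic frequencies are counted. The sketch below shows one way equal-width binning can work, using the five dates from `df.head()`; the helper `bin_index` is hypothetical, and BERTopic's own binning may treat bin edges differently.

```python
from datetime import datetime


def bin_index(ts, start, end, nr_bins):
    """Index (0 .. nr_bins - 1) of the equal-width time bin that contains ts."""
    span = (end - start).total_seconds()
    if span == 0:
        return 0
    position = (ts - start).total_seconds() / span
    # The latest timestamp would land on index nr_bins, so clamp it into the last bin
    return min(int(position * nr_bins), nr_bins - 1)


# Review dates copied from the df.head() output
dates = [
    datetime(2018, 7, 7, 22, 9, 11),
    datetime(2012, 1, 3, 15, 28, 18),
    datetime(2014, 2, 5, 20, 30, 30),
    datetime(2015, 1, 4, 0, 1, 3),
    datetime(2017, 1, 14, 20, 54, 15),
]
start, end = min(dates), max(dates)
bins = [bin_index(d, start, end, nr_bins=50) for d in dates]
print(bins)
```

With 2000 reviews spread over several years and 50 bins, each bin holds about 40 reviews on average, so per-topic counts within a single bin can be small and noisy.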
Code Cell 11 (5%) - Visualize the topics per star rating (e.g., 1, 2, 3, 4, 5).
In [117]:
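# Break topic frequencies down by star rating (1-5) instead of plotting topic words
topics_per_class = model.topics_per_class(df['text'], classes=df['stars'].tolist())
fig = model.visualize_topics_per_class(topics_per_class, top_n_topics=6)
fig.show()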
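A per-rating breakdown amounts to counting topic assignments within each star value. The sketch below does this by hand for the first five reviews, pairing the `stars` column from `df.head()` with the topics printed in Code Cell 7; it uses only the standard library and is an illustration of the idea, not BERTopic's implementation.

```python
from collections import Counter, defaultdict

# (stars, assigned topic) for the first five reviews, copied from the outputs above
rows = [(3, 4), (5, 22), (3, 2), (5, 24), (4, -1)]

# Map each star rating to a tally of the topics assigned to its reviews
topics_by_star = defaultdict(Counter)
for stars, topic in rows:
    topics_by_star[stars][topic] += 1

for stars in sorted(topics_by_star):
    print(stars, dict(topics_by_star[stars]))
```

Run over all 2000 reviews, the same tally shows which topics concentrate in 1-2 star reviews and which in 4-5 star reviews.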
Text Cell 12 (10%) - Comment on your findings based on the visualized topic frequency over time and topics per star rating in Code cells 10 and 11. In general, what are the major topics mentioned about good and bad restaurant experiences, respectively? What are the business implications?
Analysis of Topic Frequency Over Time and Star Ratings
Findings from Topic Frequency Over Time (Code Cell 10)
- COVID-19 discussions (Topic 3) spiked during heightened restrictions, reflecting customer concerns about health and safety.
- Food-related topics (e.g., pizza in Topic 2, bagels/donuts in Topic 4) remained stable, indicating consistent customer interest.
- Restaurant ambiance (Topic 5) fluctuated seasonally, potentially influenced by holidays, renovations, or events.
Insights from Topics by Star Rating (Code Cell 11)
- High ratings (4-5 stars): Highlighted food quality, menu variety, and a pleasant dining atmosphere.
- Low ratings (1-2 stars): Focused on service issues (e.g., long wait times, incorrect orders), cleanliness concerns, and uncomfortable dining conditions.
Major Themes in Positive vs. Negative Restaurant Experiences
- Positive experiences: Customers value well-prepared food (pizza, bagels, sandwiches), friendly service, and a comfortable atmosphere.
- Negative experiences: Complaints center on poor service, hygiene concerns, and inadequate dining conditions (overcrowding, poor maintenance).
Business Implications
- Service Enhancements: Addressing slow service and order mistakes through staff training and workflow improvements can boost satisfaction.
- Marketing Opportunities: Popular items like pizza and bagels can be leveraged for promotions and loyalty programs.
- Health & Safety: Ongoing concerns about COVID-19 emphasize the importance of maintaining strong safety protocols.
- Ambiance & Layout Optimization: Improving seating, decor, and crowd management can enhance the dining experience.
Conclusion
Tracking topic trends over time and by rating helps businesses identify strengths (popular menu items, ambiance) and address weaknesses (service and cleanliness issues), ultimately improving customer satisfaction and business success.
Code Cell 13 (10%) - Specify a few topics in the restaurant reviews and use zero shot topic modeling to create a topic model
In [118]:
from bertopic.representation import KeyBERTInspired
zeroshot_topic_list = ["food quality", "service", "ambience", "price", "menu variety"]
# We fit our model using the zero-shot topics
# and we define a minimum similarity. For each document,
# if the similarity does not exceed that value, it will be used
# for clustering instead.
topic_model = BERTopic(
    embedding_model="thenlper/gte-small",
    min_topic_size=5,
    zeroshot_topic_list=zeroshot_topic_list,
    zeroshot_min_similarity=.75,
    representation_model=KeyBERTInspired()
)
topics, _ = topic_model.fit_transform(df.text)
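The comment in the cell above describes the role of `zeroshot_min_similarity`: a review is given a predefined label only if it is similar enough to that label, and is otherwise left for ordinary clustering. The sketch below mimics that decision with toy two-dimensional vectors. The vectors, the helper names, and the use of `>=` at the threshold are all assumptions; the real model compares sentence embeddings, and its internal steps may differ.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def assign_zeroshot(doc_vec, topic_vecs, min_similarity=0.75):
    """Return the best-matching label, or None if no label reaches the threshold."""
    best_label, best_sim = None, -1.0
    for label, vec in topic_vecs.items():
        sim = cosine_similarity(doc_vec, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim >= min_similarity else None


# Toy 2-D "embeddings", purely hypothetical
topic_vecs = {"food quality": [1.0, 0.0], "service": [0.0, 1.0]}

print(assign_zeroshot([0.9, 0.1], topic_vecs))  # food quality
print(assign_zeroshot([0.7, 0.7], topic_vecs))  # None (left for clustering)
```

A lower threshold routes more reviews to the predefined labels, while a higher one leaves more of them to be clustered into new topics.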
Code Cell 14 (5%) - Visualize the most relevant words for the top 10 topics and the topic hierarchy for the zero shot topic model
In [119]:
topic_model.get_topic_info().head(10)
Out[119]:
| | Topic | Count | Name | Representation | Representative_Docs |
|---|---|---|---|---|---|
| 0 | 0 | 1203 | 0_restaurant_food_delicious_burger | [restaurant, food, delicious, burger, bar, piz... | [It is typical Broad Ripple charming spot. Sma... |
| 1 | 1 | 447 | 1_customer_service_experience_place | [customer, service, experience, place, staff, ... | [I first found out about this adorable little ... |
| 2 | 2 | 156 | 2_restaurant_tasted_food_tasteless | [restaurant, tasted, food, tasteless, chicken,... | [After a long hiatus from reviewing I have awa... |
| 3 | 3 | 100 | 3_hotel_rooms_stay_room | [hotel, rooms, stay, room, tour, amazing, open... | [I absolutely love this hotel. It is certainly... |
| 4 | 4 | 94 | 4_room_hotel_money_place | [room, hotel, money, place, went, more, price,... | [Its a rarity that i give five stars for anyth... |
In [120]:
fig = topic_model.visualize_barchart(top_n_topics=10)
fig.show()
In [122]:
fig = topic_model.visualize_hierarchy(top_n_topics=10)
fig.show()
Text Cell 15 (5%) - Interpret each topic in the zero shot topic model: what is each topic about?
Topic Interpretations
- Topic 11: Mole and Dish Reviews – Focuses on Mexican cuisine, particularly mole dishes, with reviews discussing taste, authenticity, or preparation.
- Topic 13: Craft Beer and Themed Experiences – Mentions of "growlers" and "firetruck" suggest craft beer discussions and themed restaurant experiences.
- Topic 6: Sub Sandwich Chains – Reviews compare chains like Subway and Jersey Mike’s, discussing taste, service, and pricing.
- Topic 4: Bagels and Bakery Items – Focuses on breakfast items, with reviews highlighting quality, freshness, and variety.
- Topic 0: Miscellaneous / Unclear Topic – Contains common words without a clear theme, possibly due to minimal preprocessing.
- Topic 2: Pizza Reviews – Discusses pizza quality, flavor, and overall dining experience.
- Topic 3: COVID-19 Safety Measures – Reviews mention mask policies and hygiene practices.
- Topic 1: General / Unstructured Topic – Similar to Topic 0, consisting of filler words with no strong theme.
- Topic 5: Restaurant Ambiance and Seating – Focuses on dining environment, including seating, decor, and atmosphere.
- Topic 9: Discussions on Racial Issues – Mentions of "racist" and "white" suggest discussions on discrimination or related experiences.
- Topic 8: Fun and Entertainment – Covers entertainment-based dining experiences like bowling and golf.
- Topic 10: Gas Station or Convenience Dining – Mentions "qt_gas_pumps," indicating reviews on restaurants near gas stations or convenience stores.
- Topic 12: Employee and Workplace Concerns – Discusses work conditions, employee treatment, and hygiene standards.
- Topic 7: Spanish-Language or Cultural Context – Mentions Spanish phrases, possibly indicating Spanish-language reviews or cultural dining elements.
By analyzing these topics, businesses can identify customer concerns, popular menu items, and areas for improvement. Reviewing individual reviews can further refine these insights.
Text Cell 16 (5%) - Acknowledge if you have used any GenAI tools in this assignment and anyone you have worked together with on this assignment.
I referred to the professor's code and used ChatGPT for the interpretations.
Code Cell 17 (5%) - Render HTML output of this Python notebook
In [1]:
!jupyter nbconvert "Lab Assignment 5/LA5_ShahRhythm.ipynb" --to html
[NbConvertApp] WARNING | pattern 'Lab Assignment 5/LA5_ShahRhythm.ipynb' matched no files
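The warning `pattern 'Lab Assignment 5/LA5_ShahRhythm.ipynb' matched no files` means nbconvert could not find the notebook at that relative path, so no HTML file was produced. A minimal sketch of a path check to run before the conversion follows; the helper `nbconvert_command` is hypothetical, and the correct path depends on the notebook's actual location relative to the working directory.

```python
from pathlib import Path


def nbconvert_command(notebook_path, to="html"):
    """Build the nbconvert argument list, failing early if the notebook is missing."""
    path = Path(notebook_path)
    if not path.is_file():
        raise FileNotFoundError(f"Notebook not found: {path.resolve()}")
    return ["jupyter", "nbconvert", str(path), "--to", to]


try:
    cmd = nbconvert_command("Lab Assignment 5/LA5_ShahRhythm.ipynb")
    print(" ".join(cmd))
except FileNotFoundError as err:
    # Printing the resolved path shows where the notebook was expected
    print(err)
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the path resolves.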